Research
Human Detection of AI-Generated Phishing
An ongoing study into which phishing techniques humans miss most when AI partially standardizes linguistic quality across conditions. Data is collected through Threat Terminal, a game-based research platform where players classify emails, bet confidence, and earn XP while contributing to a real dataset.
Study Protocol
Altiparmak, S. (2026). Human Detection of AI-Generated Phishing: Study Protocol and Dataset Design for the Threat Terminal Experiment (v1.1). Zenodo.
doi.org/10.5281/zenodo.19156047 → Preprint (not peer-reviewed) · CC BY 4.0 · Published March 22, 2026
Why This Matters
Most published research on phishing detection focuses on automated filtering rather than human judgment. Security awareness training still teaches heuristics that assume poor writing quality, an assumption AI has already invalidated. The question practitioners actually need answered (which attack techniques bypass trained humans when the writing is no longer the tell) does not yet have good empirical data behind it.
That gap matters more now than it did two years ago. AI-generated phishing is compressing the skill gap between low-effort campaigns and targeted social engineering, and federal cybersecurity strategy is starting to reflect that shift. Financial services, where email-based attacks account for a disproportionate share of breaches, is particularly exposed. This study is designed to produce the kind of data that informs both training programs and detection strategy.
Research Question
The dominant heuristics for identifying phishing have historically been linguistic: look for grammar errors, awkward phrasing, unusual idioms, and formatting inconsistencies. These signals held up because real phishing campaigns were written sloppily, often by non-native speakers working at volume. That era is over. AI-generated phishing is grammatically flawless, contextually plausible, and available at negligible marginal cost.
The study question is: when linguistic quality is held constant across all emails, phishing and legitimate alike, which phishing techniques produce the lowest human detection rates?
Technique is the only independent variable. Every card in the dataset, phishing and legitimate, was generated by an AI model. This controls for writing quality and removes it as a confound. What remains are the structural and contextual properties of each technique: how it frames the request, what authority it invokes, what urgency it creates, and whether it establishes a plausible backstory.
Secondary questions: Does professional security background improve detection rates for specific techniques, or uniformly across all of them? Does overconfidence cluster around specific techniques? And do security professionals show meaningfully lower bypass rates than technical non-security users, or does security experience predict detection accuracy less well than commonly assumed?
Dataset
1,000
Total cards
690
Phishing cards
310
Legitimate cards
6
Techniques studied
Phishing cards (690)
Six techniques, 115 cards each. Each technique block is split into four difficulty tiers to ensure the dataset captures a realistic range of attack sophistication rather than clustering at a single difficulty level. Forensic metadata also varies by tier: easy and medium cards default to failed email authentication (SPF/DKIM/DMARC), while hard and extreme cards may present verified or ambiguous authentication status, removing header analysis as a reliable shortcut.
| Difficulty | Cards per technique |
|---|---|
| Easy | 35 |
| Medium | 35 |
| Hard | 35 |
| Extreme | 10 |
Legitimate cards (310)
Legitimate cards cover three real-world email categories. Including a realistic volume of legitimate emails ensures players cannot gain an edge by defaulting to phishing classifications, and that false positive rates are measurable.
| Category | Cards |
|---|---|
| Transactional (receipts, shipping, account updates) | 110 |
| Marketing (newsletters, promotions, announcements) | 100 |
| Workplace (internal comms, HR, IT notices) | 100 |
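The composition above can be written down as a small sanity check. The sketch below encodes the published card counts in plain Python (the names and structure are illustrative, not the platform's actual schema) and verifies that the tiers and categories sum to the 690/310/1,000 totals:

```python
# Illustrative encoding of the v1 dataset composition described above.
# Structure and names are hypothetical, not the platform's real schema.

TECHNIQUES = [
    "urgency", "authority_impersonation", "credential_harvest",
    "hyper_personalization", "pretexting", "fluent_prose",
]

# Cards per difficulty tier within each technique block.
TIER_COUNTS = {"easy": 35, "medium": 35, "hard": 35, "extreme": 10}

# Legitimate cards by real-world category.
LEGITIMATE = {"transactional": 110, "marketing": 100, "workplace": 100}

cards_per_technique = sum(TIER_COUNTS.values())          # 115
phishing_total = cards_per_technique * len(TECHNIQUES)   # 690
legitimate_total = sum(LEGITIMATE.values())              # 310

assert phishing_total == 690
assert legitimate_total == 310
assert phishing_total + legitimate_total == 1000
```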
The dataset is frozen at v1 once 1,000 approved cards are reached. All cards go through an admin review pipeline before going live: generated using two Claude models (approximately 80% Claude Haiku 4.5, 20% Claude Sonnet 4.6), with per-card model provenance recorded for post-hoc analysis of whether model choice affects detection rates. Cards are staged for review, then approved or rejected by a human reviewer. Cards are not added to the live dataset without review.
The platform includes a card reporting feature. Participants can flag any card during classification. A reported card is excluded from primary analysis only if review confirms one of three pre-registered criteria: incorrect ground truth, internal inconsistency between forensic metadata and email content, or a rendering or display error. Reports that do not meet these criteria do not result in exclusion.
Phishing Techniques Studied
Each technique represents a distinct social engineering mechanism. They were selected because they map to real-world attack patterns documented in threat intelligence reporting, and because they make different cognitive demands on the classifier. Technique is the only independent variable in the study. Everything else (prose quality, email structure, and presentation) is held constant.
Urgency
Emails that manufacture artificial time pressure to force fast, unconsidered decisions. These messages typically invoke expiring accounts, unprocessed payments, immediate action requirements, or looming security events. The tell is structural: the email wants you to act before you think. In a controlled quality environment, where the prose is polished and the framing is plausible, urgency becomes harder to isolate as a signal because it also appears in legitimate transactional emails: password resets, shipping updates, calendar reminders.
Authority Impersonation
Messages that impersonate a figure whose instructions carry implicit compliance pressure: executives, IT departments, HR, legal teams, government agencies, or established institutions. The attack exploits deference. Recipients are conditioned to respond to certain names and titles without scrutinising the request itself. In the dataset, all sender names and organisations are plausible rather than obviously spoofed, which removes the low-effort check of looking for misspelled brand names.
Credential Harvest
Classic credential phishing: an email directing the recipient to a login page, verification flow, or account recovery process. These messages are the backbone of most real-world phishing campaigns because they work. The dataset focuses on the email layer, not the destination. Cards present the message itself and reveal forensic signals (SPF/DKIM/DMARC status, reply-to analysis, URL characteristics) after the player classifies it. The goal is to test whether players can detect the phishing intent from the message alone, before they ever click.
Hyper-personalization
Emails that reference contextually plausible personal or professional detail to establish authenticity. These might reference a recent purchase, a shared connection, a project name, an industry, or a role-specific process. The technique exploits the cognitive shortcut of recognising familiar context as a legitimacy signal. Hyper-personalized phishing is expensive to produce at scale with human writers, but AI makes it trivially cheap. This category tests whether the presence of relevant-sounding context meaningfully lowers detection rates.
Pretexting
Multi-step social engineering that establishes a believable backstory before making the ask. The email arrives as part of an implied ongoing interaction: a follow-up to a meeting that may or may not have happened, a response to a request the recipient may or may not remember making, a continuation of a vendor relationship. The pretext does the work. The request itself is often mundane. Detection requires recognising the setup as artificial rather than evaluating the request on its own terms.
Fluent Prose
Phishing with no urgency cues, no authority figure, no personalization, and no pretext. Just polished, neutral email language making a request. This is the hardest category to classify because it removes every conventional heuristic simultaneously. The email reads like a normal business communication. The study hypothesis is that fluent prose phishing will have the highest bypass rate precisely because it offers nothing obvious to flag. If that hypothesis holds, it has significant implications for how security awareness training frames the "what to look for" question.
Methodology
Game modes
The platform requires account creation via email one-time password (no persistent password is stored). All participants begin in Research Mode, which contributes classified answers to the study dataset and is capped at 30 answers (three sessions of ten cards each). Sessions draw 10 cards via uniform random selection from each participant's remaining pool without stratification, so technique representation balances naturally at scale. After completing Research Mode, participants unlock Freeplay and Expert Mode. Freeplay uses separately generated cards that are not part of the research card pool. Freeplay data is not persisted to the study database.
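The session-dealing logic described above amounts to unstratified sampling without replacement, cut off at the 30-answer cap. A minimal sketch, with function and constant names that are assumptions rather than the platform's actual code:

```python
import random

SESSION_SIZE = 10          # cards per Research Mode session
MAX_RESEARCH_ANSWERS = 30  # three sessions, then graduation

def deal_session(remaining_pool: list[str], answered: int) -> list[str]:
    """Draw the next session's cards, or nothing once the cap is reached."""
    if answered >= MAX_RESEARCH_ANSWERS:
        return []  # participant has completed Research Mode
    # Uniform random selection, no stratification by technique or tier:
    # technique representation is left to balance naturally at scale.
    return random.sample(remaining_pool, k=min(SESSION_SIZE, len(remaining_pool)))

pool = [f"card_{i}" for i in range(1000)]
session = deal_session(pool, answered=0)
assert len(session) == 10 and len(set(session)) == 10
assert deal_session(pool, answered=30) == []
```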
Classification and confidence
For each card, players make two decisions: classification (phishing or legitimate) and confidence level. Confidence is expressed in three tiers:
| Level | Score multiplier | Interpretation |
|---|---|---|
| GUESSING | 1× | uncertain classification |
| LIKELY | 2× | moderate confidence |
| CERTAIN | 3× | high confidence |
Confidence data is recorded alongside correctness. This allows the study to measure calibration: whether players who report high confidence are actually more accurate, whether overconfidence clusters around specific techniques, and whether security professionals show better calibration than non-security users.
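Calibration in this sense reduces to accuracy conditioned on reported confidence: a well-calibrated player should be most accurate when CERTAIN and least accurate when GUESSING. A minimal sketch of that computation (data layout is hypothetical):

```python
from collections import defaultdict

# Score multipliers from the confidence table above.
MULTIPLIER = {"GUESSING": 1, "LIKELY": 2, "CERTAIN": 3}

def calibration_by_confidence(answers):
    """Accuracy per reported confidence tier.

    `answers` is an iterable of (confidence, correct) pairs. Divergence
    between reported confidence and accuracy indicates miscalibration.
    """
    totals = defaultdict(lambda: [0, 0])  # tier -> [correct, seen]
    for confidence, correct in answers:
        totals[confidence][0] += int(correct)
        totals[confidence][1] += 1
    return {tier: c / n for tier, (c, n) in totals.items()}

sample = [("CERTAIN", True), ("CERTAIN", False), ("GUESSING", True), ("LIKELY", True)]
print(calibration_by_confidence(sample))
# → {'CERTAIN': 0.5, 'GUESSING': 1.0, 'LIKELY': 1.0}
```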
Data collected per answer
Research Mode answers are linked to a pseudonymous player UUID. Email addresses are held only in Supabase Auth and are never stored in research tables. The research tables record:
- Player UUID (pseudonymous, not linked to email outside auth)
- Game mode (research mode only)
- Card technique, difficulty tier, and correct classification
- Player answer, confidence level (GUESSING / LIKELY / CERTAIN), and input method (button or keyboard)
- Three timing measurements: total time from card render to answer, confidence-to-answer time, and confidence deliberation time (all in milliseconds)
- Forensic interaction telemetry: scroll depth percentage and whether URLs were inspected (header inspection was collected prior to panel removal; see protocol change below)
- Session context: answer ordinal position, running correct-answer streak, cumulative correct count, and session identifiers
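The fields above can be pictured as one row per answer. The dataclass below is a hypothetical rendering of that record (the platform's actual column names may differ):

```python
from dataclasses import dataclass

@dataclass
class ResearchAnswer:
    """One Research Mode answer row, per the fields listed above.

    Field names are illustrative, not the platform's actual schema.
    """
    player_uuid: str                 # pseudonymous, never linked to email
    technique: str                   # e.g. "pretexting"
    difficulty: str                  # easy / medium / hard / extreme
    ground_truth: str                # "phishing" or "legitimate"
    answer: str                      # player classification
    confidence: str                  # GUESSING / LIKELY / CERTAIN
    input_method: str                # "button" or "keyboard"
    total_time_ms: int               # card render to answer
    confidence_to_answer_ms: int
    confidence_deliberation_ms: int
    scroll_depth_pct: float          # forensic interaction telemetry
    urls_inspected: bool
    answer_ordinal: int              # position within the session
    streak: int                      # running correct-answer streak
    cumulative_correct: int
    session_id: str

    @property
    def correct(self) -> bool:
        return self.answer == self.ground_truth
```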
Professional background
Players can optionally self-report their professional background on their profile. This field is used to compare bypass rates across groups and test whether security experience produces meaningfully better detection outcomes. The options are:
- INFOSEC / CYBERSECURITY: working in security
- TECHNICAL / NON-SECURITY: technical role outside security
- OTHER: general users, students, non-technical roles
- PREFER NOT TO SAY: excluded from group comparison analysis
Background is optional. Players who select “prefer not to say” are excluded from group comparison analysis while their answer data remains in the main dataset.
Forensic signals
After each round, players see a forensic signal breakdown for every card they classified. This serves a dual purpose: it functions as the learning layer of the game, and it trains players on real detection signals rather than just telling them the answer. The signals revealed are:
SPF / DKIM / DMARC (removed)
Authentication status for the sending domain was originally displayed during classification. The header inspection panel was removed mid-study after analysis showed it introduced a confound: participants with technical backgrounds used it as a shortcut rather than evaluating the email itself, which undermined the study's focus on technique-level detection. Authentication metadata remains in the dataset for post-hoc analysis. See the v1.1 addendum for full rationale.
Reply-To Mismatch
Whether the reply-to address differs from the from address. A common technique for harvesting replies without controlling the sending domain. Legitimate bulk email often uses separate reply-to addresses, so this requires contextual interpretation.
Send Timestamp Analysis
The time and timezone offset of the message. Emails sent at unusual hours or from unexpected timezone offsets can indicate automated sending infrastructure or a mismatch between the claimed organisation and the actual sender location.
URL Inspector
Tappable links that reveal destinations. Hovering or tapping a link in a real email is one of the most reliable quick checks available. The game simulates this to train the habit and to show players the gap between displayed anchor text and the actual URL.
Attachment Name Analysis
Where applicable, the filename and extension of attachments. Double extensions, unusual formats for the claimed document type, and names engineered to trigger opens are all represented in the dataset.
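The double-extension pattern mentioned above lends itself to a simple heuristic. A sketch, with extension lists that are illustrative rather than exhaustive:

```python
# Illustrative sketch of attachment-name heuristics; the extension
# sets below are examples, not a complete detection list.
SUSPICIOUS_FINAL_EXTS = {"exe", "scr", "js", "vbs", "bat", "cmd", "lnk"}
DOCUMENT_EXTS = {"pdf", "doc", "docx", "xls", "xlsx", "zip"}

def attachment_red_flags(filename: str) -> list[str]:
    """Return red flags for a filename, e.g. 'invoice.pdf.exe'."""
    parts = filename.lower().rsplit(".", 2)
    flags = []
    if len(parts) == 3 and parts[1] in DOCUMENT_EXTS:
        # A document extension hiding the real final extension.
        flags.append("double extension")
    if parts[-1] in SUSPICIOUS_FINAL_EXTS:
        flags.append("executable extension")
    return flags

assert attachment_red_flags("invoice.pdf.exe") == ["double extension", "executable extension"]
assert attachment_red_flags("Q3-report.xlsx") == []
```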
Protocol change: authentication header panel removal
The original platform design included an interactive panel where players could inspect email authentication headers (SPF, DKIM, DMARC status) during classification. This panel was removed mid-study as a deliberate protocol change, documented in the v1.1 addendum.
The rationale: the study is designed to measure human detection of phishing techniques, not header literacy. Early data showed that participants with technical backgrounds were using authentication status as a classification shortcut, bypassing the email content entirely. This created a confound where detection rates reflected header-reading ability rather than technique recognition. Removing the panel refocused the experiment on what it was designed to measure.
Authentication metadata is still stored in the dataset and remains available for post-hoc analysis. Answers collected before and after the panel removal are flagged in the data, allowing the change to be accounted for in analysis. The addendum documents the exact timing, affected data points, and analytical approach for handling the transition.
Expert Mode
After submitting 30 research answers (3 completed sessions), participants unlock Expert Mode. Expert Mode draws exclusively from separately generated extreme-tier cards (10 cards per technique, 60 cards total in the phishing pool) that are not part of the Research Mode pool, and awards double XP. Expert Mode answers are not included in the primary research dataset. Upon graduating to Expert Mode, participants can no longer contribute new data to the study; they continue to play for engagement and review personal statistics, but subsequent classifications are excluded from analysis. This design prevents experienced participants from skewing the research dataset with repeated exposure effects. Expert Mode engagement data is tracked separately for potential secondary analysis of skill progression.
Limitations
Self-selected sample
Participants are players who discovered Threat Terminal and opted into Research Mode. This is not a random sample of the general population. Results will over-represent people who are security-aware or curious about phishing, which likely biases detection rates upward compared to a general workforce sample.
Game context
Players know they are classifying emails in a game environment. This may produce different cognitive engagement than real-world email triage, where classification competes with other tasks and attention is not guaranteed. Game context may inflate detection rates by focusing attention on the task.
AI-generated cards
All cards, including legitimate ones, are AI-generated. This produces a controlled dataset but means the legitimate emails do not carry the full contextual richness of real correspondence. In practice, recipient-specific context (knowing the sender, expecting the email, recognising internal references) is a strong legitimacy signal that the dataset cannot replicate.
Self-reported background
Professional background is self-reported and not verified. Players may misclassify their background or select options that do not accurately reflect their day-to-day exposure to security concepts.
Hyper-personalization ceiling
Hyper-personalization cards use plausible contextual detail rather than genuine information about the specific participant. The study measures recognition of hyper-personalization as a technique structure, not the effectiveness of genuine personalization. In-study bypass rates for this technique will not reflect its real-world ceiling.
Card classification reliability
Each card is assigned a technique label and difficulty tier by the author alone. No second coder independently classified the cards, and no inter-rater reliability metric is reported. Technique assignment is determined at generation time by the prompt specification, and the admin review pipeline rejects cards that do not clearly instantiate the specified technique, but a formal reliability audit with an independent coder is planned prior to the empirical findings paper.
Within-study learning effects
Participants receive forensic signal breakdowns after each 10-card session, meaning classifications in sessions two and three are informed by feedback from prior sessions. This pedagogical feedback is integral to the platform but introduces a potential learning confound. The planned analysis includes session order as a covariate to estimate the magnitude of within-study learning.
Base rate awareness
The 69/31 phishing-to-legitimate ratio is a design choice to ensure statistical power and prevent gaming, not a reflection of real-world base rates where the overwhelming majority of messages are legitimate. Absolute detection rates should be interpreted in context rather than as estimates of real-world performance.
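The base-rate point can be made concrete with a quick Bayes calculation. The sensitivity and specificity values below are purely illustrative, not study results; the point is how much the positive predictive value moves when the same classifier operates at real-world base rates:

```python
def phishing_ppv(sensitivity: float, specificity: float, base_rate: float) -> float:
    """P(actually phishing | player says phishing), by Bayes' rule."""
    true_pos = sensitivity * base_rate
    false_pos = (1 - specificity) * (1 - base_rate)
    return true_pos / (true_pos + false_pos)

# Same hypothetical player, two environments:
in_study = phishing_ppv(0.80, 0.90, base_rate=0.69)    # 69% phishing, as in the dataset
real_world = phishing_ppv(0.80, 0.90, base_rate=0.01)  # ~1% phishing in a real inbox

print(f"{in_study:.2f} vs {real_world:.2f}")
# → 0.95 vs 0.07
```

Identical judgment, radically different practical meaning, which is why absolute in-study rates should not be read as real-world performance estimates.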
Single-family generation
The dataset comprises cards generated by two models from the Anthropic Claude family (Haiku 4.5 and Sonnet 4.6). While using two models introduces stylistic variation, both share underlying training characteristics. Per-card model provenance is recorded, enabling secondary analysis of whether generation model affects detection rates.
Hypotheses
These hypotheses were formed before data collection began and are stated here to distinguish predictions from post-hoc rationalisations once results are available.
Highest bypass rate among named techniques
Among the five named technique categories, Pretexting is expected to produce the highest bypass rate. Pretexting conceals malicious intent inside a plausible narrative; the request itself appears mundane, requiring detection of the surrounding setup rather than the content of the ask. Fluent Prose, which removes every conventional social engineering mechanism simultaneously, is expected to produce a high bypass rate as well, but serves a structurally distinct role as a baseline measuring detection failure when no identifiable trick is present. The empirically informative comparisons are the relative ordering among the five named techniques and the gap between those techniques and the Fluent Prose baseline.
Hyper-personalization deserves a separate note here. In real-world deployments, it would likely be the most effective technique of all: an email that references your actual name, role, recent activity, or known colleagues is substantially harder to dismiss than a generic message. Cards labelled hyper-personalization use plausible contextual detail rather than detail drawn from the specific player seeing the card. This is a deliberate scope boundary, not a flaw in the design. The study is measuring recognition of technique structure: can players identify that an email is attempting to exploit personal familiarity, regardless of whether it references their actual details? That is a separable question from whether real-world personalisation is effective, and it is the question this dataset is built to answer. In-study bypass rates for hyper-personalization will not reflect its real-world ceiling, but they were never intended to.
Lowest bypass rate
Credential Harvest is expected to be the most detectable technique. It is the attack pattern most consistently covered in security awareness training, and players are conditioned to scrutinise login prompts and link destinations more than any other element of an email. Even in a controlled environment where prose quality is held constant, the structural fingerprint of credential phishing is recognisable: there is always an ask to authenticate somewhere.
Group differences
Security professionals (INFOSEC group) are expected to outperform both technical non-security users and general users in overall detection rate. Daily exposure to threat patterns, incident reports, and phishing simulations should produce better intuition across most technique categories. The more interesting question is whether that advantage is uniform or concentrated: security professionals may show dramatically better detection on some techniques while performing comparably to other groups on techniques that exploit cognitive shortcuts rather than technical knowledge.
Confidence calibration
Players will be overconfident when wrong. Incorrect classifications are expected to skew toward LIKELY and CERTAIN rather than GUESSING, meaning players will not just miss phishing emails but will miss them while feeling sure they are right. This pattern is expected to cluster on techniques that produce the most plausible-looking output: pretexting and fluent prose. If a well-constructed pretext reads like a normal email, the player who misclassifies it has no signal telling them they should be uncertain. That confident wrongness is a meaningful finding in its own right, separate from raw bypass rates.
Status
Data collection ongoing. A findings report is in progress and will be published once the study reaches 100 participants.
Live findings are published at research.scottaltiparmak.com/intel and update in real time as data comes in. A formal write-up will be submitted for peer consideration as the dataset matures. If you are a researcher interested in collaborating, reviewing methodology, or discussing the data, get in touch.
Participate
All participants begin in Research Mode after creating a free account (email OTP, no password). Each session is 10 cards and takes about five minutes. After completing 30 research answers (three sessions), Freeplay and Expert Mode unlock.